The locking policy introduced in this paper essentially solves the problem of hanging and crashed processes for cfengine, using a minimum of system resources. Although the simplicity of the algorithm could make the autonomous garbage collection procedure inappropriate for certain programs, in most cases of interest to system administrators, the behaviour is sensible and correct. The principal advantage of these locks is that one can always be confident that the system will not seize up; the flow of updates remains in motion.
How do system administrators use these locks in practice? The simplest
way, which is completely transparent, is to use cfengine as a
front-end for starting all scripts. This has several advantages, since
cfengine provides a powerful classing engine which can be used to make
a single net-wide cron file. Cfengine can do many things, but it is
valuable even solely as a script scheduler. The alternative to this
is to implement the locks in Perl or shell or some other scripting
language. This is easily accomplished since the locks use only files
(echo >> file) and inodes (touch file). Time comparisons
are harder in the shell, but not insurmountable. Languages like Perl
and Guile/scheme should implement the locks as a library module.
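As a rough illustration of such a library module, the following Python sketch implements a file-based lock with an IfElapsed and an ExpireAfter interval. It is not cfengine's own code: the function names, the `.lock` and `.last` file suffixes, the use of seconds rather than minutes, and the choice of SIGTERM on expiry are all assumptions made for the example.

```python
import os
import signal
import time


def _terminate(pid):
    """Default expiry action (an assumption): ask the stale owner to exit."""
    try:
        os.kill(pid, signal.SIGTERM)
    except (ProcessLookupError, PermissionError):
        pass  # owner already gone, or not ours to signal


def acquire(lock_dir, atom, if_elapsed, expire_after, now=None, kill=_terminate):
    """Return True if the caller may run `atom` now, False if it should skip."""
    now = time.time() if now is None else now
    lock = os.path.join(lock_dir, atom + ".lock")  # hypothetical naming
    last = os.path.join(lock_dir, atom + ".last")  # hypothetical naming

    # Too soon after the last completed run: skip.
    if os.path.exists(last) and now - os.path.getmtime(last) < if_elapsed:
        return False

    # An existing lock is either still valid (skip) or expired (clean it up).
    if os.path.exists(lock):
        if now - os.path.getmtime(lock) < expire_after:
            return False
        try:
            with open(lock) as fh:
                kill(int(fh.read().strip()))
        except (OSError, ValueError):
            pass  # unreadable or malformed lock file; just remove it
        try:
            os.remove(lock)
        except FileNotFoundError:
            pass

    # O_EXCL makes creation atomic, so only one contender can win.
    try:
        fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as fh:
        fh.write(str(os.getpid()))
    return True


def release(lock_dir, atom):
    """Record the completion time and drop the lock."""
    last = os.path.join(lock_dir, atom + ".last")
    with open(last, "a"):
        pass
    os.utime(last, None)  # the equivalent of `touch`
    try:
        os.remove(os.path.join(lock_dir, atom + ".lock"))
    except FileNotFoundError:
        pass
```

A script would call `acquire` before starting its work and `release` when it finishes. The `now` and `kill` parameters exist only so that the time comparisons and the expiry action can be exercised without waiting or signalling real processes.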
One minor problem we have run into occurs with programs which are started through calls to rsh. In this case, the rsh process does not always terminate, even when the process started by rsh has exited. If such a program is killed when its lock expires, its processes will not necessarily die in the intended fashion. Thus, while the new instantiation of the program restarts the entire task anew, hanging processes from the ostensibly killed instantiation may remain and simply clutter up the process table. A possible solution would be to kill the entire process group for the rsh, but this method is not completely portable. This is presently a teething problem to be solved.
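The process-group idea can be sketched in Python on a POSIX system as follows. This is an illustration only, not what cfengine does: the helper names are hypothetical, and the signal reaches only local descendants in the group (such as the rsh client itself), not anything still running on the remote host.

```python
import os
import signal
import subprocess


def run_in_own_group(argv):
    """Start argv in a new session, so that it leads its own process group."""
    return subprocess.Popen(argv, start_new_session=True)


def kill_group(proc, sig=signal.SIGTERM):
    """Signal every process in the child's process group (POSIX only)."""
    try:
        os.killpg(os.getpgid(proc.pid), sig)
    except ProcessLookupError:
        pass  # the group has already gone away
```

Because the child is started in its own session, signalling the group does not affect the parent that holds the lock. The reliance on `os.killpg` and `start_new_session` reflects the portability caveat mentioned above: neither is available outside POSIX systems.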
Adaptive locks contribute an insignificant amount of time to the total runtime in trials with cfengine and conceal the occurrence of spurious messages associated with the locks. Our locks are simple to implement and may be used in any program built from atomic operations that need not be executed in any strict order. An added side effect is that programs become effectively re-entrant to multiple threads.
It would make a fascinating study to determine whether the intelligence of a program like cfengine could be extended to encompass learning with respect to the jobs it carries out. Could, for instance, the values of IfElapsed and ExpireAfter be tuned automatically from the collective experience of the system itself? For example, an atom which is frequently killed could be allowed more time to complete. Conversely, programs which are started at every IfElapsed interval could indicate an attempt to spam the system, and measures could be taken to warn about or restrict the use of that atom. It is surprising how many interesting issues can be attached to such a simple idea as the adaptive lock, and we hope to return to some of them in the future, as part of our programme of research into self-maintaining operating systems.
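To make the first of these suggestions concrete, one purely hypothetical tuning rule is sketched below in Python. Nothing of the kind exists in cfengine as described here; the function name, the kill-rate threshold, the growth factor and the ceiling are all invented for illustration.

```python
def tune_expire_after(expire_after, outcomes,
                      kill_threshold=0.5, growth=1.5, ceiling=240):
    """Suggest a new ExpireAfter value from recent run outcomes.

    `outcomes` is a list of booleans, True where the atom was killed
    because its lock expired. All default parameters are assumptions.
    """
    if not outcomes:
        return expire_after  # no experience yet, so change nothing
    kill_rate = sum(outcomes) / len(outcomes)
    if kill_rate > kill_threshold:
        # Frequently killed: allow more time, up to a fixed ceiling.
        return min(expire_after * growth, ceiling)
    return expire_after
```

Under these assumed defaults, an atom killed in two of its last three runs would have its ExpireAfter raised from 120 to 180, while an atom killed only once in three runs would keep its current value.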